DETAILED ACTION



Claim Rejections - 35 USC § 112 



The following is a quotation of the first paragraph of 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making 
and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it 
pertains, or with which it is most nearly connected, to make and use the same and shall set forth the best mode 
contemplated by the inventor of carrying out his invention. 

1. Claim 23 is rejected under 35 U.S.C. 112, first paragraph, as failing to comply with the
written description requirement. The claim(s) contains subject matter which was not described 
in the specification in such a way as to reasonably convey to one skilled in the relevant art that 
the inventor(s), at the time the application was filed, had possession of the claimed invention. 
The specification, as originally filed, fails to teach a "numeric recognition processor", as claimed 
in claim 23. The specification simply discloses a numeric understanding processor (page 2, line 
16, page 4, line 17 and Figure 1, element 20). 



Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the
basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless -

(e) the invention was described in (1) an application for patent, published under section 122(b), by another filed
in the United States before the invention by the applicant for patent or (2) a patent granted on an application for
patent by another filed in the United States before the invention by the applicant for patent, except that an
international application filed under the treaty defined in section 351(a) shall have the effects for purposes of this
subsection of an application filed in the United States only if the international application designated the United
States and was published under Article 21(2) of such treaty in the English language.

2. Claim 28 is rejected under 35 U.S.C. 102(e) as being anticipated by Alleva et al. (US
Patent No. 5,970,449). 

3. Regarding claim 28, Alleva et al. teaches a system for text normalization in which the
output of a speech recognizer is processed to produce a representation of the appropriate digits. 
Alleva describes the speech recognition processor that produces textual output corresponding to 
recognized portions of input speech, such that the recognizer produces text such as "ten cents" 
and "four o'clock in the afternoon", which reads on "a speech recognition processor that receives 
unconstrained input speech and outputs a string of words, the speech recognition processor being 
based on a numeric language that represents a subset of a vocabulary, the subset including a set 
of words identified as being relevant for interpreting and understanding number strings," since 
the words ten, cents, four and o'clock are the vocabulary words of numeric language that are 
relevant for interpreting and understanding number strings related to currency and time (col. 3, 
line 18 to col. 4, line 6; Abstract; Figure 1, element 32; Figure 9, element 132; col. 1, lines 56- 
62; col. 6, lines 14-17 and 40-42; col. 5, lines 62-65 and col. 6, lines 32-64); 

At col. 6, lines 14-64, Alleva et al. describes the rules the text normalizer (element 38,
Figures 3A-3E) implements to process the string of words received from the speech recognizer to 
generate a sequence of corresponding digits, which reads on "a numeric understanding processor 
containing classes of rules for converting the string of words into a sequence of digits."

Claim Rejections - 35 USC § 103



The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:



(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 



4. Claims 17-19, 21-27, 29-34, and 36 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Alleva et al. (US Patent No. 5,970,449) in view of Sukkar (US Patent No.
5,613,037). 

5. Regarding claim 17, Alleva et al. teaches

receiving a speech signal at col. 3, line 18 to col. 4, line 6; 

Alleva describes the speech recognition processor that produces textual output 
corresponding to recognized portions of input speech, such that the recognizer produces text such 
as "ten cents" and "four o'clock in the afternoon," which reads on "performing speech 
recognition process on the received speech signal to produce speech recognition results, wherein 
a numeric language includes a subset of a vocabulary, the subset of the vocabulary including 
words that identify digits in number strings and words that enable the interpretation and 
understanding of number strings," since the words ten, cents, four and o'clock are the 
vocabulary words of numeric language that are relevant for interpreting and understanding 
number strings related to currency and time (col. 3, line 18 to col. 4, line 6; Abstract; Figure 1, 
element 32; Figure 9, element 132; col. 1, lines 56-62; col. 6, lines 14-17 and 40-42; col. 5, lines 
62-65 and col. 6, lines 32-64);

At col. 6, lines 14-64 and Figure 9, elements 122, 124, 126, 128, and 130, Alleva et al.
describes the rules the text normalizer (element 38, Figures 3A-3E) implements to process the 
string of words received from the speech recognizer to generate a sequence of corresponding 
digits, which reads on "generating a sequence of digits using said speech recognition results, said 
generating being based on a set of rules." 
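For illustration only, the general kind of rule-based conversion from a recognized word string to a sequence of digits discussed above can be sketched as follows. This is a minimal sketch and is not the implementation of Alleva; the rule table `DIGIT_WORDS` and the function `words_to_digits` are hypothetical names, and only a single-digit rule is shown.

```python
# Hypothetical rule table (an assumption for illustration, not taken from Alleva):
# each spoken digit word maps to its digit character.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def words_to_digits(words):
    """Convert a recognized word string to a digit sequence.

    Words that match no rule (for example, city names) are skipped.
    """
    tokens = words.lower().split()
    return "".join(DIGIT_WORDS[t] for t in tokens if t in DIGIT_WORDS)


if __name__ == "__main__":
    print(words_to_digits("Seattle Washington nine eight zero five two"))
```

A fuller normalizer would add further rule classes (for example, natural numbers such as "ten" or "nineteen ninety seven"), which this sketch deliberately omits.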

Alleva fails to explicitly teach a system comprising acoustic models utilized by the 
speech recognition processor. However, implementation of acoustic models in a speech 
recognition system was well known in the art. 

In a similar field of endeavor, Sukkar discloses a speech recognition system comprising
an acoustic model utilized by the speech recognition processor (Figure 3, element 308).
Additionally, Sukkar teaches a digit model for digit recognition and a second model, a filler 
model, a generalized HMM model of spoken words that do not contain digits (col. 3, line 19 to 
col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the 
purpose of accurately producing vector representations of the received input speech. 

Regarding claim 18, Alleva teaches the speech recognition processor at Figure 1, element 
32; col. 3, line 18 to col. 4, line 6. 

Regarding claim 19, Alleva does not teach that the recognition process is based on a set of
acoustical models that has been defined for other words in the vocabulary.

Sukkar discloses a speech recognition system comprising an acoustic model utilized by the
speech recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit
model for digit recognition and a second model, a filler model, a generalized HMM model of
spoken words that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the 
purpose of accurately producing vector representations of the received input speech. 

Regarding claim 21, Alleva teaches the speech recognition processor that produces 
textual output corresponding to recognized portions of input speech, such that the recognizer 
produces text such as "ten cents," "April first nineteen ninety seven," "Seattle Washington nine 
eight zero five two" and "four o'clock in the afternoon," which reads on "numeric language 
includes digits, natural numbers, alphabets, and city/country name classes," since the words ten, 
cents, April, Seattle, Washington, four and o'clock are the vocabulary words of numeric 
language that are relevant for interpreting and understanding number strings related to classes of 
digits, natural numbers, alphabets, and city/country name (col. 3, line 18 to col. 4, line 6; 
Abstract; Figure 1, element 32; Figure 9, element 132; col. 1, lines 56-62; col. 6, lines 14-17 and 
40-42; col. 5, lines 62-65 and col. 6, lines 32-64). 

Alleva does not teach the numeric language includes a re-starts class. At col. 5, lines
48-52, Sukkar discloses implementation of a misrecognition classifier, so as to account for the errors
during recognition. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement words in the numeric language related to recognition errors to account for errors 
during the recognition process, as suggested by Sukkar, for the purpose of providing reliable and 
accurate recognition and thereby improving system performance.

Regarding claim 22, Alleva does not explicitly teach that the acoustic models are hidden Markov
models. Sukkar discloses a speech recognition system comprising an acoustic model utilized by
the speech recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit 
model for digit recognition and a second model, a filler model, a generalized HMM model of 
spoken words that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the 
purpose of accurately producing vector representations of the received input speech. 

Regarding claim 23, Alleva does not teach a numeric recognition processor. Sukkar 
discloses a speech recognition system comprising an acoustic model utilized by the speech
recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit model for 
digit recognition and a second model, a filler model, a generalized HMM model of spoken words 
that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement a numeric recognition processor as taught by Sukkar in the recognition system of 
Alleva, for the purpose of accurately producing vector representations of recognized numbers of 
the received input speech. 

Regarding claims 24 and 26-27, Alleva teaches that a set of rules includes a naturals rule, a
restarts rule, a city/country rule, and an alphabets rule at Figure 9, element 126 and col. 6, line 3 to
col. 7, line 9.

Regarding claim 25, Alleva does not teach the set of rules includes re-starts rules. At col.
5, lines 48-52, Sukkar discloses implementation of a misrecognition classifier, so as to account for
the errors during recognition. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to process or normalize the words in the numeric language output from the speech
recognizer that are related to recognition errors to account for errors during the recognition
process, as suggested by Sukkar, for the purpose of providing reliable and accurate recognition
and thereby improving system performance.

Regarding claim 29, Alleva fails to explicitly teach a system comprising acoustic models 
utilized by the speech recognition processor. However, implementation of acoustic models in a 
speech recognition system was well known in the art. 

Sukkar discloses a speech recognition system comprising an acoustic model utilized by the
speech recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit 
model for digit recognition and a second model, a filler model, a generalized HMM model of 
spoken words that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the 
purpose of accurately producing vector representations of the received input speech. 

Regarding claim 30, Alleva fails to explicitly teach a first set of hidden Markov models 
that characterize acoustic features of words in the numeric language and a second set of hidden 
Markov models that characterize acoustic features of words in the remainder of the vocabulary.

Sukkar teaches a digit model for digit recognition and a second model, a filler model, a
generalized HMM model of spoken words that do not contain digits (col. 3, line 19 to col. 4, line 
22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic hidden Markov model teachings of Sukkar in the recognition system 
of Alleva, for the purpose of accurately producing vector representations of the received input 
speech. 

Regarding claim 31, Alleva fails to explicitly teach a filler model that characterizes out of 
vocabulary features. 

Sukkar teaches a digit model for digit recognition and a second model, a filler model, a 
generalized HMM model of spoken words that do not contain digits (col. 3, line 19 to col. 4, line 
22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic hidden Markov model teachings of a filler model, as suggested by 
Sukkar in the recognition system of Alleva, for the purpose of accurately producing vector 
representations of the received input speech to accurately distinguish numeric input from other 
speech input. 

Regarding claim 32, Alleva fails to teach an utterance verification processor. At col. 5, 
lines 44-52, Sukkar describes a digit/non-digit classification that identifies speech containing 
valid digits, speech not containing a digit and speech containing misrecognitions. Sukkar 
teaches the misrecognitions are identified as non-digits so that errors can be rejected and not 
classified as valid digit data.

Therefore, it would have been obvious to one of ordinary skill at the time of the invention
to modify the system of Alleva to implement utterance verification as taught by Sukkar, for the 
purpose of ensuring that only valid digit information is recognized and classified as actual digit 
speech. 

Regarding claim 33, Alleva does not teach a validation database or a string validation 
processor. At col. 7, lines 6-49, Sukkar describes candidate string validation based on individual 
candidate digit confidence scores that are determined using a digit vocabulary set of the digit 
models. Sukkar teaches the string validation is implemented so that errors in the string cause the 
string to be rejected, which is desirable for many applications. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to modify the system of Alleva to implement string validation as taught by Sukkar, for the 
purpose of ensuring that only valid digit information is accepted and applications using the 
system process and operate with valid data. 
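For illustration only, string validation of the general kind discussed above, in which an error in any digit causes the whole string to be rejected, can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the confidence scoring of Sukkar; the function `validate_digit_string`, the score range, and the `threshold` value are hypothetical.

```python
def validate_digit_string(candidates, threshold=0.5):
    """Accept or reject a candidate digit string as a whole.

    candidates: list of (digit, confidence) pairs, with confidence assumed
    to lie in [0, 1] (an assumption made for this sketch).
    Returns the digit string if every digit meets the threshold, else None.
    """
    if not candidates:
        return None
    # A single low-confidence digit causes rejection of the entire string.
    if any(conf < threshold for _, conf in candidates):
        return None
    return "".join(digit for digit, _ in candidates)


if __name__ == "__main__":
    print(validate_digit_string([("9", 0.9), ("8", 0.8), ("0", 0.7)]))
    print(validate_digit_string([("9", 0.9), ("8", 0.2), ("0", 0.7)]))
```

Rejecting the entire string, rather than dropping only the doubtful digit, reflects the point that an application should operate only on strings in which every digit is considered valid.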

Regarding claim 34, at col. 8, lines 28-38, Alleva teaches the normalizer normalizes the 
text and a speech API forwards the normalized content to an application program, which reads on
"a dialogue manager processor that initiates an action based on the validity information." 

Regarding claim 36, Alleva teaches that a set of rules includes a naturals rule, a restarts rule, a
city/country rule, and an alphabets rule at Figure 9, element 126 and col. 6, line 3 to col. 7, line 9.
Alleva does not teach the set of rules includes a re-starts rule. At col. 5, lines 48-52, Sukkar
discloses implementation of a misrecognition classifier, so as to account for the errors during 
recognition.

Therefore, it would have been obvious to one of ordinary skill at the time of the invention
to process or normalize the words in the numeric language output from the speech
recognizer that are related to recognition errors to account for errors during the recognition
process, as suggested by Sukkar, for the purpose of providing reliable and accurate recognition
and thereby improving system performance.

6. Claims 13-16, 20 and 35 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Alleva et al. (US Patent No. 5,970,449) in view of Sukkar (US Patent No. 5,613,037) and further
in view of Huang et al. (US Patent No. 5,937,384).

7. Regarding claim 13, Alleva describes the speech recognition processor that produces 
textual output corresponding to recognized portions of input speech, such that the recognizer 
produces text such as "ten cents" and "four o'clock in the afternoon," which reads on "a speech 
recognition method comprising, defining a numeric language, the numeric language including a 
subset of a vocabulary, the subset of the vocabulary including words that identify digits in 
number strings and words that enable the interpretation and understanding of number strings," 
since the words ten, cents, four and o'clock are the vocabulary words of numeric language that 
are relevant for interpreting and understanding number strings related to currency and time (col. 
3, line 18 to col. 4, line 6; Abstract; Figure 1, element 32; Figure 9, element 132; col. 1, lines 56- 
62; col. 6, lines 14-17 and 40-42; col. 5, lines 62-65 and col. 6, lines 32-64); 

Alleva does not teach a set of acoustic models for the numeric language, a second set of 
acoustical models that has been defined for other words in the vocabulary or storing the first and 
second set of acoustical models in an acoustic model database.

Sukkar discloses a speech recognition system comprising an acoustic model utilized by the
speech recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit 
model for digit recognition and a second model, a filler model, a generalized HMM model of 
spoken words that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the 
purpose of accurately producing vector representations of the received input speech. 

Alleva and Sukkar do not implement a first quality level for the first acoustic models and 
a second quality level for the second acoustic models. Huang teaches a method and system for 
speech recognition using continuous density hidden Markov models, which implements context- 
dependent HMMs and context-independent HMMs and teaches that the use of both types of 
HMMs is beneficial in achieving an improved recognition accuracy (col. 6, lines 18-38). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to modify the system of Alleva and Sukkar to implement both context-dependent HMMs and 
context-independent HMMs, as taught by Huang, for the purpose of achieving an improved recognition
accuracy, as suggested by Huang.

Regarding claim 14, Alleva teaches the speech recognition processor that produces 
textual output corresponding to recognized portions of input speech, such that the recognizer 
produces text such as "ten cents," "April first nineteen ninety seven," "Seattle Washington nine 
eight zero five two" and "four o'clock in the afternoon," which reads on "numeric language 
includes digits, natural numbers, alphabets, and city/country name classes," since the words ten, 
cents, April, Seattle, Washington, four and o'clock are the vocabulary words of numeric
language that are relevant for interpreting and understanding number strings related to classes of
digits, natural numbers, alphabets, and city/country name (col. 3, line 18 to col. 4, line 6; 
Abstract; Figure 1, element 32; Figure 9, element 132; col. 1, lines 56-62; col. 6, lines 14-17 and 
40-42; col. 5, lines 62-65 and col. 6, lines 32-64). 

Alleva does not teach the numeric language includes a re-starts class. At col. 5, lines
48-52, Sukkar discloses implementation of a misrecognition classifier, so as to account for the errors
during recognition. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement words in the numeric language related to recognition errors to account for errors 
during the recognition process, as suggested by Sukkar, for the purpose of providing reliable and 
accurate recognition and thereby improving system performance.

Regarding claim 15, Alleva does not explicitly teach that the acoustic models are hidden Markov
models. Sukkar discloses a speech recognition system comprising an acoustic model utilized by
the speech recognition processor (Figure 3, element 308). Additionally, Sukkar teaches a digit 
model for digit recognition and a second model, a filler model, a generalized HMM model of 
spoken words that do not contain digits (col. 3, line 19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic model teachings of Sukkar in the recognition system of Alleva, for the 
purpose of accurately producing vector representations of the received input speech. 

Regarding claim 16, Alleva fails to explicitly teach a filler model that characterizes out of 
vocabulary features. Sukkar teaches a digit model for digit recognition and a second model, a
filler model, a generalized HMM model of spoken words that do not contain digits (col. 3, line
19 to col. 4, line 22). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to implement the acoustic hidden Markov model teachings of a filler model, as suggested by 
Sukkar in the recognition system of Alleva, for the purpose of accurately producing vector 
representations of the received input speech to accurately distinguish numeric input from other 
speech input. 

Regarding claim 20, Alleva and Sukkar do not implement a first quality level for the first 
acoustic models and a second quality level for the second acoustic models. Huang teaches a 
method and system for speech recognition using continuous density hidden Markov models, 
which implements context-dependent HMMs and context-independent HMMs and teaches that 
the use of both types of HMMs is beneficial in achieving an improved recognition accuracy (col. 
6, lines 18-38). 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to modify the system of Alleva and Sukkar to implement both context-dependent HMMs and 
context-independent HMMs, as taught by Huang, for the purpose of achieving an improved recognition
accuracy, as suggested by Huang.

Regarding claim 35, Alleva and Sukkar do not specifically teach a language model 
database that stores data describing the structure and sequence of words and phrases. Huang 
teaches a language model that represents linguistic expressions and describes the implementation
of a language model in predicting the likelihood of occurrence of a word considering the words
that have been uttered (col. 14, lines 35-54) and teaches the system is beneficial in improving the
recognition capability of a speech recognition system. 

Therefore, it would have been obvious to one of ordinary skill at the time of the invention 
to modify the system of Alleva and Sukkar to implement language models in predicting 
likelihoods of word occurrences, as taught by Huang, for the purpose of improving recognition 
capability of the speech recognizer. 
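For illustration only, predicting the likelihood of a word given the preceding word can be sketched with a simple bigram estimate. The bigram form is chosen here as an assumption for clarity; this Office action does not state which form of language model Huang uses, and the functions `train_bigram` and `next_word_probability` are hypothetical names.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count, for each word, how often each following word occurs."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts


def next_word_probability(counts, prev, cur):
    """Relative-frequency estimate of P(cur | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0


if __name__ == "__main__":
    model = train_bigram(["four o'clock in the afternoon", "four dollars"])
    print(next_word_probability(model, "four", "o'clock"))
```

In a recognizer, such an estimate would typically be combined with acoustic scores so that word sequences that are more likely in the language are preferred.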

Response to Arguments 

8. Applicant's arguments with respect to claims 13-36 have been considered but are moot in 
view of the new ground(s) of rejection. 

9. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Angela A. Armstrong whose telephone number is 703-308-6258. 
The examiner can normally be reached on Monday-Thursday 7:30-5:00 PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil, can be reached on (703) 305-9645. The fax phone number for the
organization where this application or proceeding is assigned is 703-872-9306.

Information regarding the status of an application may be obtained from the Patent
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 



Angela A. Armstrong 

Examiner 

Art Unit 2654 



AAA 

February 7, 2004 




